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The  Role  of  Order  in  Distributed  Programs 


Kenneth  Birman 
Keith  Marzullo 


Abstract 

We  discuss  the  role  of  order  in  building  distributed  systems.  It  is 
our  belief  that  a  “principle  of  event  ordering"  underlies  the  wide  range 
of  operating  systems  mechanisms  that  have  been  put  forward  for  build¬ 
ing  robust  distributed  software.  Stated  concisely,  this  principle  is  that 
one  achieves  correct  distributed  behavior  by  ordering  classes  of  dis¬ 
tributed  events  that  conflict  with  one  another.  By  focusing  on  order, 
one  can  obtain  simplified  descriptions  and  convincingly  correct  solu¬ 
tions  to  problems  that  might  otherwise  have  looked  extremely  complex. 
Moreover,  we  observe  that  there  are  a  limited  number  of  ways  to  obtain 
order,  and  that  the  choice  made  impacts  greatly  on  performance. 


1  Introduction 

Researchers  have  proposed  a  variety  of  mechanisms  for  building  distributed 
software  within  which  component  programs  cooperate  to  perform  tasks  con¬ 
currently,  maintain  replicated  data,  and  respond  to  failures  or  recoveries  by 
dynamically  reconfiguring.  These  mechanisms  typically  provide  guarantees 
of  “consistent"  (correct)  behavior,  but  the  precise  meaning  of  consistency 
and  the  methods  by  which  consistency  is  achieved  differ  widely.  This  makes 
it  difficult  to  compare  the  different  methods  with  one  another. 

This  paper  is  based  on  the  premise  that  most  forms  of  distributed  con¬ 
sistency  can  be  achieved  by  order  generating  and  preserving  mechanisms. 
While  it  is  not  surprising  that  consistency  should  be  closely  related  to  or¬ 
dering,  we  believe  that  the  fundamental  nature  of  this  relationship  has  not 
been  widely  appreciated.  Here,  we  show  that  the  manner  in  which  order 
is  generated  and  preserved  has  significant  performance  implications,  and 
observe  that  dissimilar  high-level  abstractions  are  often  implemented  using 
surprisingly  similar  ordering  mechanisms.  Abstracted  from  any  particular 
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system,  such  ordering  mechanisms  may  provide  a  suitable  base  for  building 
distributed  operating  systems  and  programming  languages. 

2  Relating  consistency  to  order 

2.1  Reasoning  about  consistency 

To  establish  the  “correctness”  of  a  distributed  system  one  specifies  what  we 
will  call  a  distributed  consistency  property  and  shows  that  it  is  maintained 
during  execution.  Such  a  property  takes  the  form  of  a  predicate  on  the  states 
of  system  components.  Because  the  states  of  components  (and  perhaps  the 
predicate  itself)  may  evolve  during  computation,  one  also  gives  a  rule  telling 
when  it  should  be  satisfied. 

This  raises  two  issues.  The  first  concerns  the  meaning  of  time  in  a  non¬ 
realtime  distributed  system.  Lamport  has  observed  that  for  such  systems, 
any  mechanism  capable  of  ordering  events,  giving  a  way  to  label  them  and 
providing  a  way  to  compare  labels  can  play  the  role  of  time  [Lam78].  In 
fact,  although  it  is  common  to  loosely  treat  time  as  synonymous  with  re¬ 
altime,  there  is  imprecision  in  the  degree  to  which  realtime  clocks  can  be 
synchronized  in  a  network.  Lamport  argues  that  temporal  algorithms  for 
asynchronous  distributed  systems  should  be  based  on  mechanisms  such  as 
logical  clocks,  which  enable  a  program  to  determine  the  order  in  which  events 
occurred,  without  requiring  clock  synchronization.  In  the  case  of  consistency 
predicates  for  asynchronous  settings,  it  follows  that  a  rule  telling  “when”  to 
evaluate  a  predicate  should  be  expressed  in  terms  of  which  events  occur 
before  the  predicate  is  examined  and  which  occur  afterwards. 

A  second  issue  relates  to  “inconsistency”  that  has  no  operational  conse¬ 
quences.  If  a  system  is  consistent  in  all  observable  states,  we  would  argue 
that  it  behaves  correctly  -  even  if  it  passes  through  unobservable  states 
that  violate  consistency.  A  system  can  have  many  sorts  of  “observers”.  The 
states  on  which  operations  are  executed  are  internally  observable.  This  leads 
us  to  require  that  the  distributed  consistency  property  hold  on  the  process 
states  a  distributed  operation  reads  or  modifies.  For  a  system  that  takes 
external  actions,  the  external  world  is  an  observer  of  system  state  -  and 
hence  consistency  should  hold  in  situations  where  such  an  observer  might 
be  able  to  detect  violations. 
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Figure  1:  Execution  of  a  token-passing  algorithm 


2.2  A  token  passing  example 


How  do  these  issues  enter  practical  problems?  Consider  the  problems  of  mu¬ 
tual  exclusion  and  resource  management.  A  simple  mutual  exclusion  scheme 
might  support  two  operations:  acquire  and  release.  A  distributed  system 
can  implement  mutual  exclusion  with  token  passing,  employing  request 
and  pass  operations  for  token  transfer.1  A  correct  execution  will  satisfy 
many  consistency  properties:  that  there  is  at  most  one  holder  of  mutual 
exclusion  at  any  time,  this  holder  previously  did  an  acquire,  and  so  forth. 
Concerning  the  token,  we  would  require  that  the  process  holding  mutual  ex¬ 
clusion  also  holds  the  token,  that  there  is  always  exactly  one  token  holding 
process,  etc. 

Now,  consider  the  execution  in  Fig.l,  where  q  initially  holds  the  token 
and  passes  it  to  p  in  response  to  a  request.  There  are  many  ways  to  examine 
the  state  of  this  system.  If  we  look  at  p  at  time  Ti  and  q  at  time  X3,  we 
would  find  two  tokens,  and  if  we  looked  at  both  at  time  T2  we  would  find 
none.  Obviously,  the  former  “state”  is  observed  in  an  unreasonable  way:  one 
should  examine  system  states  at  the  same  time.  On  the  other  hand,  in  the 
absence  of  precisely  synchronized  realtime  clocks,  there  is  no  way  to  take 
a  simultaneous  snapshot:  times  T\  and  T2  may  well  be  indistinguishable 
to  processes  p  and  q.  Now  consider  the  latter  system  state.  This  seems 
to  expose  a  weakness  in  our  consistency  rule,  which  fails  to  address  the 
case  where  the  token  is  in  transit  when  we  look  at  the  system.  Yet,  it  is 

1  In  systems  that  support  lightweight  tasks,  a  process  holding  the  token  could  use  any 
correct  scheme  for  “local”  mutual  exclusion. 
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reasonable  to  claim  that  unless  such  a  state  is  observable  the  consistency 
rule  shouldn’t  need  to  cover  it. 

One  solution  to  this  class  of  problems  is  to  examine  only  consistent 
cuts  [CL85].  A  consistent  cut  is  any  sampling  of  process  states  such  that  if 
q  is  examined  after  receipt  of  message  m  from  p,  then  p  must  be  examined 
subsequent  to  sending  m.  (A  consistent  snapshot  is  a  consistent  cut  in  which 
channel  contents  are  also  recorded).  Consistent  cuts  eliminate  the  first  of 
the  above  problems,  but  not  the  second:  we  would  still  need  to  extend  the 
consistency  rule  to  allow  the  case  where  the  token  is  in  transit  (and  hence 
there  is  no  holder). 

2.3  A  resource  management  example 

This  specific  problem  of  messages  in  transit  is  symptomatic  of  a  larger  class 
of  problems  where  a  complex  operation  violates  consistency  while  it  is  in 
progress.  Consider  a  set  of  resource  manager  processes  that  cooperate  to 
control  a  set  of  resources  accessed  by  application  processes.  Overloaded 
managers  can  transfer  responsibility  for  a  resource  by  sending  a  transfer 
message  that  informs  all  processes  in  the  system  of  the  new  manager.  Until 
the  transfer  occurs,  a  manager  continues  to  handle  new  service  requests; 
after  the  transfer  it  rejects  requests,  which  the  client  retransmits  to  the  new 
manager.  The  consistency  rule  could  be  that  each  resource  is  managed  by  a 
unique  process  at  any  given  time,  and  that  during  a  period  when  a  resource 
is  transferred  t  times,  no  request  ever  needs  to  be  retried  more  than  t  +  1 
times. 

This  system  can  be  easily  built  using  multicast  primitives  to  implement 
the  transfers,  such  as  the  ones  the  ISIS  system  supports  [BJ87b,BJ87a]. 
Requests  can  then  be  performed  using  conventional  RPC.  However,  such 
an  implementation  will  violate  the  consistency  predicate  along  many  pos¬ 
sible  consistent  cuts.  Consider  the  multicast  message  corresponding  to  a 
TRANSFER.  This  cannot  be  delivered  instantaneously,  hence  there  will  be 
cuts  in  which  some  processes  have  received  the  message  and  some  have  not. 
Worse,  some  multicast  protocols  transfer  messages  to  their  destinations  in 
a  first  phase  but  delay  delivery  until  some  other  event  takes  place.  For 
such  a  protocol,  there  will  be  consistent  cuts  in  which  all  messages  have 
arrived,  but  some  have  not  been  delivered.  Short  of  devising  a  very  compli¬ 
cated  consistency  predicate  or  modifying  the  consistent  cut  algorithm  to  be 
knowledgeable  about  the  way  the  protocol  works,  the  only  thing  one  can  say 
is  that  consistency  will  hold  for  those  cuts  reflecting  all-or-nothing  message 
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delivery.  The  same  issue  would  have  arisen  in  the  token  passing  problem  if 
tokens  were  passed  using  a  multi-phase  commit  protocol. 

2.4  A  conjecture  about  consistency  and  ordering 

Is  there  a  way  to  formulate  an  arbitrary  problem  so  that  it  is  clear  when 
a  consistency  property  should  hold?  A  consistency  property  is  a  predicate 
that  refer  to  a  distributed  state.  Complex  operations,  such  as  the  transfer 
operation  described  above  also  refers  to  a  distributed  state.  Clearly,  some 
precondition  over  the  distributed  state  should  hold  prior  to  doing  these  sorts 
of  operations  too.  Let  us  refer  to  distributed  operations  like  these  as  meta- 
operations. 

We  conjecture  that: 

Any  system  concerned,  with  maintaining  consistency  must  impose 
and  respect  ordering  when  meta- operations  conflict. 

Two  issues  arise.  One  is  how  to  tell  when  meta-events  conflict  and  need 
to  be  ordered.  This  question  is  highly  dependent  on  semantics,  and  lies 
outside  the  scope  of  this  paper.  The  second  concerns  how  to  impose  and 
respect  ordering  when  i  ecessary. 

Our  conjecture  sounds  like  a  serializability  constraint  on  meta-operations. 
However,  notice  that  that  except  for  its  use  of  ordering,  the  problem  has  lit¬ 
tle  in  common  with  transactions  that  read  and  write  a  shared  database  using 
concurrency  control  and  multi-phase  commit  protocols,  which  is  the  tradi¬ 
tional  context  for  serializability  arguments.  Nor  are  we  trying  arguing  for 
totally  ordered  multicast  protocols  or  the  ISIS  virtual  synchrony  property, 
although  this  paper  is  certainly  motivated  by  the  latter.  Our  goal  is  to 
understand  when  one  needs  to  pay  for  the  ordering  that  protocols  such  as 
these  provide;  to  use  them  without  regard  for  need  would  be  a  costly  propo¬ 
sition.  Also,  multicasting  arises  in  only  a  subset  of  distributed  systems.  By 
abstracting  away  from  message  passing  per-se,  one  arrives  at  more  general 
results.  Moreover,  multicast  based  systems  typically  treat  each  multicast 
as  a  separate  event.  It  is  unclear  that  results  derived  for  such  a  setting 
apply  when  such  tactics  as  piggybacking  one  message  on  top  of  another  are 
permissible. 

Let  us  summarize  our  observations: 

1.  One  achieves  consistency  in  a  distributed  system  by  ordering  events 
that  conflict  with  one  another. 
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2.  These  events  involve  sets  of  operations  that  occur  in  multiple  processes 
but  are  nonetheless  related. 

3.  The  type  of  conflict-based  ordering  needed  to  ensure  consistency  is 
more  primitive  than  transactional  serializability  or  virtual  synchrony. 

The  remainder  of  this  paper  starts  by  making  the  ideas  of  meta-events 
and  meta-event  orderings  more  precise,  asking  how  meta-event  orderings 
come  about,  and  exploring  the  cost  implications  that  ordering  may  have 
at  the  application  level.  Then,  we  look  at  how  order  is  used  in  a  variety 
of  distributed  computing  systems.  The  paper  concludes  by  asking  what 
the  implications  of  a  principle  of  distributed  ordering  might  be  for  future 
distributed  systems  design  efforts. 

It  is  not  our  goal  here  to  provide  a  formally  rigorous  treatment  of  meta¬ 
events  and  order.  We  will  also  only  touch  on  issues  of  fault- tolerance  and 
realtime. 

3  Meta-event  orderings 

In  this  section,  we  introduce  an  event-ordering  model  that  includes  meta¬ 
events  and  discuss  what  it  means  for  two  meta-events  to  be  ordered. 

3.1  Events  and  Event  Orderings 

We  will  start  with  the  standard  partial-order  representation  of  a  distributed 
system.  At  the  most  primitive  level  of  a  distributed  system,  only  certain 
types  of  events  can  be  said  to  be  “ordered”.  Consider  a  system  of  processes 
that  communicate  using  messages.2  Processes  execute  operations ,  which  are 
indivisible  units  of  work:  computation,  sending  a  message,  or  receiving  a 
message.  We  use  the  term  event  to  denote  any  of  these  activities. 

Event  orderings  come  about  when  an  operation  is  influenced  by  (i.e. 
reads  from)  some  prior  operation,  or  when  the  logic  of  the  program  delays 
the  start  of  one  operation  until  another  has  finished,  or  when  a  message 
sent  by  one  process  is  received  in  another.  It  follows  that  if  one  stops  the 
execution  of  the  system  at  some  instant  in  time,  the  execution  up  to  that 
point  can  be  described  by  a  tuple  (P,E,— ►  ),  where  P  is  a  set  of  processes,  E 
is  a  set  of  events,  and  — »  gives  the  order  in  which  the  events  occurred. 

2The  ability  to  share  memory  between  processes  doesn’t  change  things  in  a  fundamental 
way,  but  it  would  introduce  complexity. 
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The  partial  order  — ►  is  actually  defined  as  the  transitive  closure  of  two 
more  primitive  partial  orders: 

1.  The  internal  ordering  on  operations ,  defined  on  a  per-process  basis. 
For  events  a  and  b  occurring  in  some  process  p,  a  —>  b  denotes  that  a 
read  from  b,  or  was  constrained  by  the  logic  of  the  program  to  execute 
after  b. 

2.  The  communication  ordering ,  defined  on  a  per-message  basis.  For 
message  m  sent  from  process  p  to  q,  sndp(m)  — ►  rcvq(m). 

3.2  Meta  events  and  meta  orderings 

Earlier,  we  discussed  the  idea  of  a  meta-operation,  the  execution  of  which 
gives  rise  to  a  meta-event.  A  meta-event  is  a  set  of  logically  related  events 
at  multiple  processes.  Examples  of  meta-events  include  the  delivery  of  a 
multicast  message  to  some  group  of  destination  processes,  the  creation  of 
a  snapshot  of  the  distributed  state  of  a  system,  the  detection  of  a  process 
failure  by  the  processes  that  survived,  or  a  transaction  on  a  database. 

We  model  a  meta-event  by  a  tuple  (i,  M )  where  i  is  an  initiating  event 
and  M  is  a  set  of  events  satisfying  Vm  6  M  :  i  — ►  m.  We  will  say  that  two 
meta-events  are  ordered  if  their  event  sets  are  ordered: 

( i ,  M)  — ►  (j,  N )  iff  Vm  G  M,n  G  N  :  m  — >  n. 

No  statement  is  made  about  the  ordering  of  the  initiation  events;  we  want  to 
allow  the  case  where  two  meta-events  are  started  concurrently  but  ultimately 
give  ordered  outcomes. 

Notice  that  a  meta-event  need  not  correspond  to  an  invocation  of  a 
multicast  or  some  other  communication  protocol.  That  is,  we  do  not  require 
here  that  the  meta-event  be  a  discrete  activity  separable  from  the  rest  of 
the  system’s  execution.  The  communication  that  links  an  initiating  event 
to  the  outcome  events  could  be  hidden  in  any  of  the  mechanisms  by  which 
information  is  transmissible  within  a  distributed  system. 

It  may  appear  that  failure  detection  by  timeout  and  the  detection  of 
“external”  events  through  sensors  give  rise  to  meta-events  that  lack  a  single 
initiating  event.  However,  if  multiple  processes  try  to  do  these  things  in 
parallel,  they  may  not  all  observe  the  same  outcome.  For  example,  one  pro¬ 
cess  might  see  an  overloaded  process  timeout,  while  other  processes  believe 
that  it  remained  operational.  If  internal  consistency  is  needed,  a  software 
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agreement  protocol  would  have  to  be  executed.  This  converts  the  physical 
timeout  events  to  logical  ones,  which  fit  our  rule. 

3.3  Origins  of  meta-order 

Imagine  a  distributed  system  in  which  process  p  wishes  to  initiate  meta¬ 
operation  ( i,M ).  Say  that  this  operation  should  be  ordered  with  respect  to 
certain  types  of  meta-events.  The  question  is  how  p  can  achieve  this  ordering 
and  how  it  is  conveyed  to  the  processes  executing  the  events  m  6  M .  There 
are  two  static  cases  and  one  dynamic  case: 

Computational  ordering:  The  first  static  case  is  when  the  meta-events 
that  should  be  ordered  are  initiated  within  a  single  computation.  That  is, 
p  “knows”  about  previously  initiated  meta-events  which  should  terminate 
first  at  any  destinations  where  they  overlap  with  M.  Say  that  (i',  M')  is  such 
a  prior  meta-event.  Then  it  must  be  the  case  that  i'  — ►  t,  because  p  could 
not  otherwise  know  about  (i\  M').  Since  — ►  is  transitive,  Vm  G  M,  i'  — >  m. 
In  other  words,  there  is  a  way  to  pass  information  about  events  prior  to  M 
to  the  places  where  events  in  M  will  occur.  As  we  will  see  below,  this  can 
be  exploited  in  different  ways.  but.  the  essential  point  is  that  the  system  has 
the  ordering  information  it  needs  and  has  a  way  to  get  it  to  where  it  will  be 
needed.  No  additional  messages  are  needed,  although  some  messages  will  to 
be  larger  since  they  need  to  carry  some  representation  of  this  information 
to  the  places  where  it  will  be  used. 

Locking  schemes  also  fit  into  the  computational  paradigm.  In  these,  the 
“lock  manager”  delays  granting  a  lock  until  after  it  has  been  released  by 
prior  holders,  establishing  a  causal  relationship  in  which  the  new  holder’s 
actions  occur  after  the  previous  holder’s  release. 

A  priori  ordering:  The  second  static  case  occurs  when  p  knows  about 
some  event  that  will  take  place  elsewhere,  but  was  not  initiated  prior  to 
i.  That  is,  the  semantics  of  the  operations  include  the  requirement  that 
Vm  G  M,  m'  G  M',  m  — >  m'.  For  example,  p’s  operation  may  be  in  response 
to  an  earlier  operation  that  concurrently  started  an  operation  on  q,  and  we 
want  the  result  of  q's  operation  to  be  ordered  before  that  of  p’s.  Again,  p 
can  pass  whatever  information  it  has  to  the  places  where  the  events  m  G  M 
will  take  place.  Presumably,  the  processes  that  receive  this  information  can 
wait  if  necessary.  (Otherwise,  p  will  need  to  wait  until  q's  event  has  occurred 
before  initiating  (t,  M),  but  this  takes  us  back  to  the  computational  case). 
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Dynamic  ordering:  The  third  case  is  the  hardest.  Here,  p  does  not  know 
if  other  meta-events  might  be  initiated  elsewhere,  concurrent  with  (t,  M). 
If  some  other  process  has  initiated  a  meta-event  that  conflicts  with  (*,Af), 
the  two  events  should  be  ordered,  although  the  order  will  not  be  known  a 
priori.  On  the  other  hand,  if  no  “conflicting”  event  is  present,  ( i ,  M)  should 
simply  be  allowed  to  occur.  In  the  dynamic  case,  additional  information  is 
needed  before  the  events  in  M  can  take  place.  As  we  will  see  below,  the  cost 
of  this  information  (the  communication  needed  to  obtain  it)  is  significantly 
higher  than  in  the  two  static  cases. 

4  The  cost  of  dynamic  ordering 

How  much  does  dynamic  order  cost?  This  problem  is  well  known  in  dis¬ 
tributed  systems. 

An  extreme  case  of  dynamic  ordering  arises  in  algorithms  for  comput¬ 
ing  a  consistent  snapshot.  An  example  is  the  method  of  Chandy  and  Lam¬ 
port  [CL85],  which  flushes  messages  from  the  communication  channels  using 
marker  messages;  the  channels  are  assumed  FIFO,  hence  messages  sent  prior 
to  the  snapshot  are  received  prior  to  the  markers.  The  cost  is  high:  a  marker 
message  flows  in  each  direction  between  each  pair  of  processes.  Moreover, 
the  only  meta-events  in  the  basic  algorithm  are  those  related  to  forming 
snapshots.  That  is,  the  snapshots  are  ordered  relative  to  base  events,  but 
might  cut  through  other  types  of  meta-events  used  in  the  application.  The 
cost  of  forming  a  snapshot  ordered  relative  to  other  sorts  of  meta-events 
could  be  even  higher. 

Dynamic  ordering  also  arises  in  multicast  protocols  that  provide  ordered 
message  delivery  and  in  protocols  for  distributed  mutual  exclusion.  Here, 
we  will  focus  on  the  multicast  case.  Ignoring  failures,  a  multicast  can  be 
modeled  as  a  meta-event  (i,  M)  in  which  i  denotes  the  initiation  of  the  mul¬ 
ticast  and  M  the  set  of  message  delivery  events.  Existing  atomic  multicast 
protocols  achieve  delivery  ordering  in  one  of  two  ways,  and  mutual  exclusion 
protocols  can  generally  be  classified  into  the  same  two  categories. 

One  approach  is  to  implement  a  rule  by  which  for  each  pair  of  con¬ 
current  multicasts,  some  uniquely  identified  process  must  pick  their  order¬ 
ing.  For  example,  this  can  been  done  using  token  passing  (the  token  holder 
picks)  [CM87],  or  with  a  tree-structured  scheme  (the  least  common  ancestor 
picks)  [GMS88].  In  sufficiently  static  settings,  this  can  also  be  done  with  a 
hashing  scheme  [CG86]. 


The  other  approach  is  decentralized.  In  this  scheme,  the  processes  vote 
on  the  ordering  they  prefer,  and  any  process  with  the  full  set  of  votes  can 
deduce  the  ordering  rule  to  use.  A  two- phase  protocol  based  on  this  approach 
is  described  in  [B  J87b].  The  “state  machine”  approach  is  similar;  it  employs 
a  fully  decentralized,  fault-tolerant  protocol  to  achieve  ordering  [Sch86j. 

There  are  ordered  broadcast  protocols  that  utilize  external  sources  of 
order.  We  would  classify  these  as  instances  of  the  first  case,  where  the  order 
generating  “process”  is  implemented  in  hardware,  or  hidden  within  a  clock 
synchronization  algorithm.  For  example,  the  Linda  S/Net  implementation 
exploited  broadcast  communication  hardware  for  this  purpose  [CG86],  Like¬ 
wise,  the  Delta-T  protocols  operate  using  synchronized  clocks  and  realtime 
timestamps;  they  delay  message  delivery  long  enough  to  compensate  for 
uncertainties  in  clock  synchronization  and  transmission  time  [CAS86]. 

There  are  also  ordered  broadcast  protocols  that  combine  some  of  the 
work  done  for  protocol  invocation  t  with  that  for  invocation  t  +  1;  for  exam¬ 
ple,  the  ISIS  failure  detection  protocol  [BJ87b].  Such  a  strategy  may  reduce 
the  overall  cost  of  the  protocol,  but  does  not  constitute  a  fundamentally  new 
way  of  obtaining  ordering. 

Notice  that  dynamic  order  comes  at  a  high  price.  In  addition  to  the  extra 
communication  required  relative  to  the  two  static  cases,  dynamic  ordering 
requires  that  events  be  delayed,  and  this  latency  can  directly  impact  the 
application  program.  That  is,  there  is  always  some  process  which  prior 
to  some  point  in  its  execution  will  not  be  able  to“safely”  perform  certain 
actions.  It  must  wait  to  obtain  additional  information,  and  this  latency 
limits  the  rate  at  which  the  application  can  make  progress. 

Thus,  the  choice  of  ordering  method  can  have  important  practical  per¬ 
formance  implications.  We  will  see  an  example  of  this  in  section  5. 

5  Two  token  passing  implementations 

A  brief  example  will  illustrate  how  ordering  issues  can  enter  into  a  higher- 
level  algorithm.  Our  goal  is  to  implement  the  token  passing  part  of  the 
mutual  exclusion  scheme  described  above  using  a  static  set  of  ISIS  processes 
(the  solution  can  easily  be  generalized  to  a  dynamic  set,  but  we  will  not  do 
so  here).  The  problem  was  first  solved  by  Schmuck  [Sch88];  the  treatment 
given  here  follows  one  in  [BJ89]. 

We  need  a  some  detail  about  two  multicast  primitives  supported  by  ISIS. 
CBCAST  and  ABCAST. 
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•  CBCAST  ensures  that  if  there  are  two  CBCASTs  satisfying  i  — *  i', 
then  delivery  order  matches  the  invocation  order:  ( i,M )  — » 

•  ABCAST  extends  CBCAST  by  also  ordering  concurrent  invocations, 
picking  an  order  to  use  in  the  concurrent  case. 

5.1  ABCAST  solution 

Using  ABCAST  we  can  implement  a  very  simple  token  passing  algorithm 
(Fig.  2).  All  operations  (PASS  and  REQUEST  with  no  parameters)  are  multi¬ 
cast  to  the  entire  set  of  processes.  Each  process  maintains  a  list  of  pending 
operations.  All  request  operations  are  granted  in  a  deterministic  order, 
and  a  REQUEST  is  granted  when  a  PASS  is  received.  Since  all  see  the  same 
operations  in  the  same  order,  behavior  is  identical. 

5.2  CBCAST  solution 

An  alternative  token  passing  scheme  multicasts  operations  using  CBCAST 
(Fig.  3.  A  consequence  is  that  processes  may  receive  requests  in  different 
orders,  and  that  the  request  lists  at  two  different  processes  may  not  contain 
the  same  requests  when  a  given  PASS  operation  is  received.  In  this  imple¬ 
mentation,  the  process  doing  a  PASS  operation  first  picks  the  request  that  it 
will  grant  (e.g.  the  first  one  on  its  list  of  pending  requests),  and  includes  this 
information  as  part  of  the  multicast  (delaying  the  PASS  in  the  case  where 
there  are  no  pending  requests).  Processes  receiving  the  pass  operation  must 
look  up  the  granted  request  on  their  list  of  pending  requests  (it  is  easy  to 
show  that  they  will  find  it  there)  and  delete  it.  This  algorithm  is  proved 
correct  in  [BJ89],  using  the  observation  that  request  and  pass  operations 
are  totally  ordered  by  — »  along  the  path  that  the  token  follows. 

5.3  Use  of  order  in  the  two  algorithms 

How  would  one  pick  between  these  two  solutions  to  the  problem?  One  is 
easier  to  understand  than  the  other,  but  the  the  difference  in  performance  far 
outweighs  any  difference  in  complexity.  The  ABCAST  algorithm  is  “tightly 
synchronous”:  all  processes  move  in  lock-step,  and  computation  advances 
slowly  because  of  the  costs  intrinsic  to  ABCAST.  The  CBCAST  version  is 
much  faster:  all  the  multicasts  can  be  done  asynchronously,  in  which  case 
computation  will  be  limited  only  by  processor  speed  and  the  capacity  of 
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the  operating  system  to  buffer  multicast  requests  and  perform  them  in  the 
background.  Using  ISIS,  these  algorithms  can  be  compared  experimentally: 
for  5  processes  running  on  SUN  3/60  hardware,  the  performance  difference 
exceeds  a  factor  of  10,  and  this  grows  with  the  number  of  processes. 

From  the  perspective  of  ordering,  the  algorithms  differ  in  the  way  that 
they  obtain  the  ordering  needed  to  maintain  consistency.  The  former  ignores 
the  invocation  order  of  the  operations.  It  acts  as  if  all  operations  potentially 
conflict  with  one  another,  and  resolves  this  by  generating  a  strong,  globally 
observed  ordering  that  it  uses  to  control  execution.  The  CBCAST  version 
is  more  cautious  in  its  use  of  available  ordering.  By  having  the  process  that 
is  about  to  do  a  pass  decide  what  request  to  grant,  consistent  distributed 
behavior  is  achieved  without  ever  resorting  to  a  costly  ABCAST,  and  the 
ordering  problem  is  reduced  to  the  computational  case. 

We  believe  this  example  is  demonstrative  of  the  general  problem.  When 
we  build  distributed  systems  without  attention  to  the  amount  of  ordering 
needed  to  achieve  consistency,  and  the  ways  that  ordering  can  be  preserved, 
we  can  find  ourselves  using  distributed  programs  as  inefficient  and  “inele¬ 
gant”  as  the  ABCAST  token  passing  algorithm. 

6  Completeness  of  the  model. 

Could  there  not  exist  many  levels  of  ordering,  like  the  ones  that  CBCAST 
and  ABCAST  provide,  but  differing  in  the  precise  rule  that  they  implement? 
In  the  case  of  multicasts,  one  can  prove  that  these  two  types  of  ordering  are 
complete  (in  the  sense  that  these  can  implement  any  other  ordered  prim¬ 
itive)  within  the  problem  classes  that  they  solve.  Schmuck  does  this  for 
CBCAST  [Sch88],  and  Schneider  discusses  work  on  the  state  machine  ap¬ 
proach  which  includes  type  of  ordering  achieved  by  ABCAST  [Sch86],  The 
same  results  can  be  expected  to  hold  in  the  case  of  meta-event  orderings. 

This  is  not  to  say  that  more  complex  forms  of  ordering  are  not  mean¬ 
ingful.  In  fact,  if  one  moves  to  systems  that  require  ordering  on  sets  of 
meta-events,  more  complex  algorithms  are  definitely  needed.  For  example, 
the  ISIS  system  implements  a  third  multicasting  primitive,  GBCAST,  which 
it  uses  for  process  group  membership  changes.  GBCAST  provides  ordering 
with  respect  to  more  than  one  class  of  meta-events,  and  requires  a  more 
costly  3-phase  protocol. 

An  interesting  direction  to  pursue  would  be  a  theory  about  the  compo¬ 
sition  of  meta-operations  and  meta-orders.  This  is  discussed  more  in  the 
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conclusions  of  this  paper. 


7  Higher  level  consistency  and  ordering. 

So  far,  our  examples  have  been  low-level  ones.  Most  distributed  computing 
mechanisms,  in  contrast,  exist  at  a  very  high  level:  reliable  multicasts,  trans¬ 
actions  and  2-phase  commit,  the  ISIS  virtually  synchronous  toolkit,  virtual 
time,  the  Linda  tuple-space,  the  Psync  primitives  [Pet87],  and  so  forth.  Our 
task  in  this  section  will  be  to  bridge  the  gap.  In  so  doing,  the  relationships 
between  these  systems  will  become  more  clear. 

7.1  Who  is  currently  operational? 

Agreement  on  failure  is  perhaps  the  most  fundamental  problem  in  a  dis¬ 
tributed  system.  Even  the  simplest  real  distributed  systems  must  address 
this  issue.  For  example,  in  a  system  with  two  processors  connected  by  a 
single  communication  link,  usually  two  kinds  of  failures  are  assumed:  pro¬ 
cessors  may  crash,  and  the  link  may  suffer  from  a  performance  failure  (t'.e. 
it  may  delay  or  drop  a  message).  These  two  kinds  of  failures  are  indistin¬ 
guishable  by  a  sending  process,  yet  the  results  of  an  inconsistent  decision 
can  be  disastrous  [Gra79].  As  a  result,  several  existing  systems  structure 
their  state  such  that  the  recovery  from  both  kinds  of  failures  are  identical. 
For  example,  in  the  Sun  NFS  protocol  [SUN86]  a  connection  holds  very 
little  state,  so  a  crash  can  be  treated  as  a  particular  kind  of  performance 
failure.  Many  other  protocols  “time  out”,  treating  performance  failures  as 
crash  failures. 

When  more  than  two  processes  are  involved,  substantial  additional  com¬ 
plexity  arises.  For  example,  transactional  concurrency  control  mechanisms 
impose  order  on  read  and  write  operations  so  that  the  execution  of  a  set  of 
concurrent  transactions  is  equivalent  to  a  serial  execution.  The  best  known 
such  mechanism  for  managing  replicating  data  is  the  available  copies  al¬ 
gorithm.  It  has  been  shown  that  if  all  processes  agree  on  when  a  process 
fails,  serializability  is  maintained.  If  they  do  not  agree,  then  a  transaction 
may  not  see  a  the  effects  of  a  conflicting  operation  from  a  virtually  earlier 
transaction  [BHG87].  Thus,  transactional  concurrency  control  mechanisms 
must  order  failure  events  relative  to  the  execution  of  other  operations  - 
specifically,  commit  operations. 

The  ambiguity  of  crash  and  performance  failures  makes  agreement  on 
who  is  operational  difficult.  If  the  effect  of  executing  an  operation  can  be 
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undone,  as  an  abort  of  a  transaction,  then  a  good  way  to  deal  with  apparent 
failures  is  simply  to  cause  an  abort.  This  is  simple  and  conservative:  if  the 
failure  was  not  real,  the  only  cost  is  that  the  transaction  must  be  redone.  On 
the  other  hand,  abort  is  not  always  meaningful;  for  example,  in  systems  that 
take  external  actions.  Lacking  this  alternative,  the  operational  processes 
must  instead  agree  on  which  processes  have  crashed,  forcing  those  processes 
to  crash  or  rejoin  the  system  if  it  later  turns  out  that  the  problem  was  a 
performance  failure. 

The  protocols  for  agreeing  on  who  is  operational  are  clearly  an  important 
part  of  any  distributed  system.  Moreover,  such  protocols  bear  a  strong  re¬ 
lationship  to  the  ordering  and  consistency  preserving  mechanisms  discussed 
above. 

To  see  this,  consider  an  application  that  makes  use  of  a  list  of  the  oper¬ 
ational  processes.  At  some  point  in  an  execution,  process  p  goes  from  being 
operational  to  having  failed.  An  application  that  depends  on  this  informa¬ 
tion  may  dynamically  adapt  itself  to  the  failure  of  p,  and  it  then  becomes 
important  that  events  initiated  after  the  failure  only  encounter  processes  at 
which  the  failure  is  already  known.  For  example,  if  q  sends  s  a  message  that 
relates  to  the  failure  of  p,  inconsistency  could  easily  arise  if  s  receives  the 
message  prior  to  observing  the  failure.  Any  messages  ordered  prior  to  the 
observation  of  the  failure  of  p  must  be  flushed  from  the  channels,  effectively 
forming  a  snapshot.  A  protocol  capable  of  achieving  this  handling  of  process 
failures  necessarily  creates  meta-order. 

It  is  not  surprising  to  find  that  the  ISIS  system  solves  this  problem  using 
a  multiphase  consensus  protocol  that  terminates  in  two  phases  after  the  last 
failure  [BJ87b],  During  the  last  phase,  this  protocol  does  a  flush,  much  like 
the  transmission  of  channel  markers  that  occurs  in  the  Chandy-Lamport 
consistent  snapshot  algorithm  [CL85].  The  ISIS  protocol  can  thus  be  under¬ 
stood  as  a  mechanism  for  drawing  a  line  (cut)  across  the  system  execution: 
events  prior  to  the  cut  have  not  observed  the  failure,  all  processes  observe 
the  failure  “simultaneously”  along  the  cut,  and  events  after  the  cut  all  reflect 
the  failure  event.  In  other  words,  the  ISIS  solution  works  by  establishing 
meta-ordering.  Similar  mechanisms  appear  in  other  systems  ([CM87],  for 
example).  We  would  argue  that  while  these  protocols  are  necessary,  they 
have  for  the  most  part  not  been  well  presented  and  understood.  This  seems 
to  be  a  problem  for  which  an  order- based  treatment  could  lead  to  significant 
simplification.  For  example,  Cristian’s  solution  to  the  membership  problem, 
in  [Cri88],  is  notable  for  a  specification  that  uses  ordering  properties  and  for 
the  simplicity  of  the  algorithms  proposed. 
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7.2  Creating  and  Preserving  Order 

According  to  the  thesis  of  this  paper,  consistency-preserving  distributed 
mechanisms  should  fall  into  two  categories:  those  where  the  meta-operations 
are  ordered  non-dynamically,  and  those  where  the  meta-operations  are  or¬ 
dered  dynamically.  In  fact,  it  was  this  observation  that  led  us  to  investigate 
ways  meta-operations  can  be  ordered. 

An  example  of  the  use  of  a  priori  ordering  arises  in  timestamp-based 
concurrency  control  algorithms  [BHG87].  When  created,  a  transaction  is 
assigned  a  timestamp  from  a  total  order,  and  the  concurrency  control  algo¬ 
rithm  can  delay  or  abort  a  transaction  if  it  attempts  to  access  a  variable  in 
an  inconsistent  order.  Liskov  and  Radkin  use  a  similar  method  in  their  work 
on  highly  available  servers  [LL86].  A  2-phase  locking  algorithm,  in  contrast, 
works  by  dynamically  ordering  meta-operations.  Any  transaction  is  ordered 
either  before  the  lock  point  of  another  transaction,  or  after  its  commit  point. 
The  meta-operations  in  this  case  are  the  read,  write  and  commit  operations, 
and  perhaps  the  write-lock  operation  if  locks  are  replicated. 

Quorum-based  replicated  data  algorithms  are  an  example  where  one  can 
see  the  relationship  between  what  we  have  called  dynamic  order  and  com¬ 
putational  order.  Viewed  externally,  a  quorum  write  is  clearly  a  case  of 
dynamic  order,  generated  using  a  decentralized  scheme.  Now,  consider  the 
same  operation  at  a  microscopic  level.  Typically,  the  write  will  require  two 
phases:  a  first  phase  during  which  the  value  to  be  written  is  distributed 
and  the  version  number  to  be  used  is  computed,  followed  by  a  second  phase 
during  which  the  update  is  committed  provided  that  a  quorum  of  responses 
was  received  and  aborted  otherwise.  At  this  level  of  abstraction,  the  first 
phase  determines  an  order  using  a  decentralized  rule,  and  the  second  phase 
respects  that  order.  That  is,  the  second  phase  is  computationally  ordered 
in  the  sense  discussed  above. 

Given  an  order,  there  are  many  distributed  mechanisms  for  preserving 
it.  The  pessimistic  schemes  preserve  order  at  all  times.  In  the  case  of 
transactional  systems,  a  pessimistic  way  to  preserve  order  is  to  force  an 
operation  to  wait  until  the  operations  prior  to  it  complete,  as  for  locking. 
Above,  we  mentioned  the  ISIS  CBCAST  primitive.  CBCAST  maintains 
ordering  information  by  augmenting  the  data  sent  with  a  message.  When 
a  message  m  is  sent  to  a  site,  copies  of  any  messages  m'  that  m  is  causally 
dependent  upon  are  included.  (Of  course,  if  the  site  has  already  received 
ro',  it  need  not  be  sent  a  second  time). 

Another  example  of  such  a  mechanism  is  Psync  [Pet87].  With  this  mech- 
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anism,  the  dependencies  among  messages  are  available  to  the  programmer. 
The  order  information  can  be  carried  as  unique  message  identifiers  rather 
than  by  forwarding  the  whole  message  as  done  in  CBCAST.  It  is  interest¬ 
ing  to  note  that  the  architects  of  Psync  chose  their  model  in  order  to  unify 
other  communication  mechanisms.  They  reasoned  that  it  would  be  rela¬ 
tively  cheap  to  implement  Psync,  and  then  hopefully  build  other  ordering 
mechanisms  cheaply  on  top  of  Psync. 

The  optimistic  schemes  for  preserving  order  depend  on  a  mechanism  for 
detecting  order  violations  and  rolling  back.  We  would  classify  timestamped 
concurrency  control  and  Jefferson’s  work  on  virtual  time  into  this  category. 

7.3  Order  in  Realtime  Systems 

A  realtime  system  is  one  in  which  a  set  of  computer  processes  interact  with 
a  set  of  physical  processes.  In  general,  the  meta-operations  of  the  computer 
processes  must  be  ordered  with  respect  to  external  actions  in  the  physical 
processes.  Since  a  computer  process  can  not  in  general  delay  a  physical 
process,  the  order  must  in  some  part  be  generated  by  the  physical  process. 
This  order  is  most  easily  represented  as  a  total  order  of  events  with  re¬ 
spect  to  some  monotonically  increasing  physical  variable.  The  most  obvious 
candidate  is  the  real  time,  but  any  such  physical  variable  can  be  used. 

Little  of  what  we  discussed  above  can  be  applied  directly  to  realtime 
systems.  For  example,  the  CBCAST  solution  to  the  token  passing  prob¬ 
lem  gains  a  substantial  performance  improvement  by  substituting  a  form 
of  logical  ordering  for  the  total  ordering  provided  by  the  ABCAST  proto¬ 
col.  This  logical  ordering  bears  no  relationship  to  realtime,  and  hence  the 
mechanism  as  presented  above  is  inappropriate  for  use  in  a  realtime  system. 
The  ABCAST  solution,  on  the  order  hand,  could  be  adapted  fairly  easily  to 
a  realtime  setting  (in  fact,  one  could  substitute  Cristian’s  Delta-T  atomic 
broadcast  and  use  the  algorithm  without  additional  changes).  Yet,  some 
realtime  systems  place  demanding  performance  requirements  on  the  proto¬ 
cols  they  use,  and  the  performance  advantages  of  the  CBCAST  solution  in 
the  asynchronous  case  suggest  that  there  might  also  be  benefits  to  using  it 
in  the  realtime  case.  The  question  that  this  raises,  but  which  we  will  leave 
open  here,  is  whether  there  might  exist  some  modified  version  of  CBCAST 
that  could  be  used  to  similar  advantage  in  realtime  environments. 
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8  Conclusions 


Our  field  has  always  searched  for  principles  to  guide  the  development  of 
operating  systems  and  distributed  systems.  We  believe  that  the  principle 
of  distributed  ordering  meets  this  criteria,  namely  that  distributed  systems 
achieve  consistency  through  consistent  distributed  orderings  of  conflicting 
events.  Insights  into  the  fundamental  properties  of  order-based  algorithms 
would  impact  a  wide  range  of  distributed  and  parallel  systems. 

Distributed  computing  has  long  been  characterized  by  intense  interest  in 
performance  and  robustness.  With  the  increasing  focus  on  closely  coupled 
distributed  services,  the  sorts  of  consistency  issues  we  raise  here  are  becom¬ 
ing  widely  relevant.  One  implication  of  a  principle  of  distributed  ordering 
is  that  such  services  will  achieve  the  maximum  performance  and  robustness 
only  through  a  careful  understanding  of  their  ordering  requirements,  and 
through  the  development  of  highly  refined  operating  system  primitives  for 
satisfying  these  requirements. 

Several  directions  suggest  themselves  for  future  study. 

Order-based  operating  system  primitives.  An  important  question  re¬ 
lates  to  how  ordering  mechanisms  should  be  presented  to  applications  pro¬ 
grammers.  Current  systems  offer  a  range  of  high  level  order-based  abstrac¬ 
tions,  such  as  transactions,  quorum  replicated  data,  atomic  multicasts,  and 
virtually  synchronous  process  groups.  One  cannot  help  but  wonder  if  there 
is  a  more  primitive  abstraction  from  which  these  higher  level  mechanisms 
could  be  constructed.  Such  an  abstraction  would  be  particularly  useful  be¬ 
cause  it  could  support  a  variety  of  these  mechanisms  at  once,  while  also 
addressing  the  needs  of  applications  that  have  reason  to  order  other  sorts 
of  operations,  such  as  the  execution  of  pieces  of  code  or  actions  taken  in 
response  to  external  events.  For  example,  it  would  be  possible  to  support 
a  notion  of  ordered  distributed  operation  that  might  come  close  to  directly 
implementing  our  meta-operations,  but  in  which  the  operation  to  perform 
would  be  specified  totally  abstractly.  A  different  approach  might  focus  on 
order  manipulation  primitives,  like  those  in  Psync  [Pet87],  but  augmented 
to  have  a  stronger  notion  of  distributed  event  and  to  reduce  any  dependence 
on  message-passing. 

Order-based  language  primitives?  The  development  of  convenient  lan¬ 
guage  support  for  transactions  has  played  a  major  role  in  making  transac- 
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tional  systems  easier  to  use  and  popular  as  a  distributed  computing  method¬ 
ology.  It  seems  natural  to  ask  if  we  can  devise  effective  language  support  for 
representing  and  manipulating  order.  For  example,  it  would  be  useful  to  ex¬ 
plore  the  possibility  of  supporting  classes  of  ordered  distributed  operations 
in  a  multiple  type  inheritance  framework. 

How  much  order  is  needed?  The  use  of  order  is  closely  tied  to  the  cost 
of  an  application.  Systematic  tools  are  needed  for  for  determining  how  much 
order  an  application  needs,  and  perhaps  for  using  ordering  as  a  complexity 
measure  under  which  different  solutions  to  a  problem  can  be  compared. 

Theory  of  order  composition.  We  noted  in  Section  6  that  a  theory  is 
needed  for  describing  the  manner  in  which  meta-orderings  can  be  composed 
to  obtain  higher  level  orderings.  Such  a  theory  could  be  a  valuable  tool  for 
reducing  complexity  and  improving  correctness  in  higher  level  systems.  It 
might  also  help  us  to  identify  limitations  on  what  can  be  achieved  in  dis¬ 
tributed  systems.  For  example,  we  noted  that  the  external  world  may  be 
an  observer  of  consistency.  We  also  observed  that  there  are  limits  on  the 
degree  to  which  actions  can  be  simultaneous  within  a  distributed  system. 
This  limit  does  not  apply  to  an  external  observer,  hence  one  could  profitably 
ask  what  constraints  an  external  consistency  requirement  places  on  the  im¬ 
plementation  of  a  distributed  algorithm.  The  answer  would  undoubtably  be 
both  deep  and  of  practical  value. 
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To  initiate  a  pass()  or  requestQ: 

ABCAST( * optype  * ) ; 

On  receiving  a  pass()  or  request (): 

case (optype)  of 
‘pass’ : 

if (is. empty (request. queue)) 
wants. request  *  TRUE; 

else 

grant (head(request.queue) ) 

*  request ’ : 

if (wants.request) 

wants.request  =  FALSE;  grant(this.request) ; 

else 

append(request_queue,  this.request) ; 

end; 

Figure  2:  ABCAST  token-passing  algorithm 
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To  initiate  a  pass()  or  request (): 

case(optype)  of 
‘pass ’ : 

while (is. empty (request.queue) )  wait ; 
CBCAST(‘pass> ,  qu.head(request.queue)) ; 
*  request  * : 

CBCAST( *  request ’ ) ; 

end; 


On  receiving  a  passO  or  request () : 

case(optype)  of 
‘pass (granted) ’ : 

grant (dequeue_request(request_queue,  granted) ) 
‘request’ : 

append(request_queue,  this.request) ; 

end; 


Figure  3:  CBCAST  token-passing  algorithm 
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